Continuous Speech Recognition at LIMSI

Authors

Lori F. Lamel

Jean-Luc Gauvain

Abstract

This paper presents some of the recent research on speaker-independent continuous speech recognition at LIMSI, including efforts in phone and word recognition for both French and English. Evaluation of an HMM-based phone recognizer on a subset of the BREF corpus gives a phone accuracy of 67.1% with 35 context-independent phone models and 74.2% with 428 context-dependent phone models. The word accuracy is 88% for a 1139-word lexicon and 86% for a 2716-word lexicon, using a word pair grammar with respective perplexities of 101 and 160. Phone recognition is also shown to be effective for language, sex, and speaker identification. The second part of the paper describes the recognizer used for the September-92 Resource Management evaluation test. The HMM-based word recognizer is built by concatenation of the phone models for each word, where each phone model is a 3-state left-to-right HMM with Gaussian mixture observation densities. Separate male and female models are run in parallel. The lexicon is represented with a reduced set of 36 phones so as to permit additional sharing of contexts. Intra- and inter-word phonological rules are optionally applied during training and recognition. These rules attempt to account for some of the phonological variations observed in fluent speech. The speaker-independent word accuracy on the Sep92 test data was 95.6%. On the previous test materials, which were used for development, the word accuracies are 96.7% (Jun88), 97.5% (Feb89), 96.7% (Oct89) and 97.4% (Feb91).
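As a rough illustration of the word-model construction mentioned in the abstract, the sketch below chains 3-state left-to-right phone HMMs into a single word-level transition matrix. It is only a minimal sketch under stated assumptions: the function names, the self-loop probability of 0.6, and the example pronunciation are hypothetical and not taken from the paper, and the Gaussian mixture observation densities are omitted entirely.

```python
import numpy as np

def phone_transitions(n_states=3, self_loop=0.6):
    """Transition block for one left-to-right phone model.

    Each state either stays where it is or advances by one state.
    The extra last column is the exit out of the phone.
    self_loop=0.6 is an arbitrary illustrative value (assumption).
    """
    A = np.zeros((n_states, n_states + 1))
    for s in range(n_states):
        A[s, s] = self_loop            # stay in the same state
        A[s, s + 1] = 1.0 - self_loop  # move to the next state (or exit)
    return A

def word_transitions(phones, n_states=3, self_loop=0.6):
    """Concatenate phone models: the exit of one phone feeds the
    first state of the next. The last column is the word exit."""
    total = n_states * len(phones)
    A = np.zeros((total, total + 1))
    for i in range(len(phones)):
        off = i * n_states
        A[off:off + n_states, off:off + n_states + 1] = phone_transitions(
            n_states, self_loop
        )
    return A

# Hypothetical lexicon entry: a three-phone word gives 9 emitting states.
A = word_transitions(["k", "ae", "t"])
```

In this sketch every phone shares the same transition probabilities for simplicity; in a trained recognizer these would be estimated per model.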

Similar resources

Recent Activities in Spoken Language Processing at LIMSI

This paper summarizes recent activities at LIMSI in multilingual speech recognition and its applications. While the main goal of speech recognition is to provide a transcription of the speech signal as a sequence of words, the same basic technology serves as the first step in other application areas, such as in automatic systems for information access and for automatic indexation of audiovisual...

Invited Talk: Processing Broadcast Audio For Information Access

This paper addresses recent progress in speaker-independent, large vocabulary, continuous speech recognition, which has opened up a wide range of near and mid-term applications. One rapidly expanding application area is the processing of broadcast audio for information access. At LIMSI, broadcast news transcription systems have been developed for English, French, German, Mandarin and Portuguese...

The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system

In this paper we describe the English Conversational Telephone Speech (CTS) recognition system jointly developed by BBN and LIMSI under the DARPA EARS program for the 2004 evaluation conducted by NIST. The 2004 BBN/LIMSI system achieved a word error rate (WER) of 13.5% at 18.3xRT (realtime as measured on Pentium 4 Xeon 3.4 GHz Processor) on the EARS progress test set. This translates into a 22....

Transcribing Broadcast News: The LIMSI Nov96 Hub4 System

In this paper we report on the LIMSI Nov96 Hub4 system for transcription of broadcast news shows. We describe the development work in moving from laboratory read speech data to real-world speech data in order to build a system for the ARPA Nov96 evaluation. Two main problems were addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the sign...
